# Gravity Bilateral Model — Project Documentation

## Overview

This project implements a gravity model of bilateral demographic capital flows (Workplan Item 4.6) and integrates it with the clearing channels decomposition (Workplan Item 5.1) into a standalone paper. The core question: does the demographic *difference* between the two countries in a pair predict the direction and magnitude of bilateral flows?

## Directory Structure

```
gravity_bilateral/
├── data/
│   ├── raw/                    # Downloaded data
│   │   ├── cpis_bilateral.csv  # 8.8M rows, IMF PIP (CPIS)
│   │   ├── cdis_bilateral.csv  # 528K rows, IMF DIP (CDIS)
│   │   ├── cepii_geodist.csv   # 50K pairs, CEPII GeoDist
│   │   ├── dist_cepii.xls      # Raw CEPII file
│   │   ├── dist_cepii.zip      # CEPII download archive
│   │   ├── expanded_bond_yields.csv    # 35 countries, FRED OECD MEI
│   │   └── fred_additional_yields.csv  # 12 additional countries
│   └── processed/
│       └── bilateral_panel.csv # 509K obs, merged panel
├── scripts/
│   ├── phase1_download_data.py # Download CPIS, CDIS, CEPII; merge
│   ├── phase2_estimation.py    # Gravity estimation (models 2a-2f)
│   ├── phase3_cca_robustness.py # CCA, jackknife, margins
│   ├── phase4_expand_yields.py  # Download additional bond yields
│   ├── phase4b_reestimate_expanded.py # Re-estimate with 35-country yields
│   ├── phase5_projections.py   # Bilateral projections through 2050
│   ├── phase5_ppml.py          # PPML robustness estimation
│   └── wald_tests.py           # Joint significance tests
├── src/
│   ├── __init__.py
│   ├── model.py                # PanelGLS (copied from followup)
│   └── macro.py                # Macro utilities (copied from followup)
├── output/
│   └── tables/
│       ├── bilateral_coverage.csv
│       ├── gravity_results.csv          # Phase 2 models (2a-2f)
│       ├── gravity_results_expanded.csv # Phase 4 expanded models (2e-exp, 2f-exp)
│       ├── gravity_robustness.csv       # Phase 3 robustness
│       ├── jackknife_results.csv        # Phase 3 jackknife
│       ├── mediation_decomposition.csv  # Rate vs direct channel (original S1)
│       ├── mediation_decomposition_expanded.csv # Expanded S1 comparison
│       ├── s1_expanded_coefficients.csv # S1 original vs expanded
│       ├── wald_tests.csv              # Joint significance tests
│       ├── ppml_results.csv            # PPML robustness estimates
│       ├── bilateral_projections.csv   # Full projection panel (32 countries × 4 years)
│       ├── projection_summary_by_country.csv # Net reallocation pressure
│       └── top_bilateral_shifts_2050.csv     # Top 10 increases/decreases
├── paper/
│   ├── paper.md                # Draft paper (10 sections, 11 tables)
│   ├── paper.docx              # Word version (auto-generated)
│   ├── references.bib          # BibTeX bibliography (19 entries)
│   ├── convert_to_docx.py      # MD→DOCX converter
│   ├── tables/
│   └── figures/
└── docs/
    ├── PROJECT_DOCUMENTATION.md  # This file
    └── WORKPLAN.md               # Open items
```

## Data Sources

### 1. CPIS (Coordinated Portfolio Investment Survey)
- **IMF database**: `PIP`
- **Indicators**: `P_TOTINV_P_USD` (total), `P_F51_P_USD` (equity), `P_F3_P_USD` (debt)
- **Accounting entry**: `A` (assets = outward holdings)
- **Coverage**: 86 reporters, 216 partners, 2001-2024
- **Download**: Chunked by 15 reporter countries with 1.5s rate limiting
- **Size**: 8.8M raw rows → 326K wide-format bilateral-year obs

### 2. CDIS (Coordinated Direct Investment Survey)
- **IMF database**: `DIP`
- **Indicator**: `OTWD_D_NETAL_FALL_ALL` (outward DI net, all instruments)
- **Coverage**: 219 reporters, 218 partners, 2009-2024
- **Size**: 528K raw rows → 436K clean bilateral-year obs

### 3. CEPII GeoDist
- **Source**: http://www.cepii.fr/distance/dist_cepii.zip
- **Format**: ZIP containing XLS file (CSV URL is broken as of Feb 2026)
- **Variables**: dist_weighted (distw), dist_simple (dist), contiguity (contig), common_lang_official (comlang_off), common_lang_ethno (comlang_ethno), colonial_ties (colony), common_colonizer (comcol)
- **Coverage**: 224 countries, 49,952 directed pairs
- **Gotcha**: XLS uses "." for missing dist_weighted — must convert to numeric and fill from dist_simple

### 4. Demographics
- **Source**: `followup/data/processed/full_panel.csv`
- **Filter**: `year <= 2024` (panel includes projections through 2101)
- **Key columns**: iso3, year, Z_1, Z_2, Z_3, kaopen, ngdp_usd, ca_gdp, real_bond_10y_diff, log_lending_rate

## Constructed Variables

| Variable | Formula | Description |
|:---------|:--------|:------------|
| `dZ_k` | `Z_k_i - Z_k_j` | Signed bilateral demographic difference, origin minus destination (k=1,2,3) |
| `dZ_k_x_kaopen_j` | `dZ_k × kaopen_j` | Demographic difference × destination openness |
| `log_dist` | `log(dist_weighted)` | Log population-weighted bilateral distance |
| `log_gdp_product` | `log(ngdp_usd_i × ngdp_usd_j)` | Log GDP product |
| `log_portfolio_total` | `log(portfolio_total)` where > 0 | Log bilateral portfolio position |
| `has_portfolio_total` | `1 if portfolio_total > 0` | Extensive margin indicator |
| `rate_diff_ij` | `real_bond_10y_diff_i - real_bond_10y_diff_j` | Bilateral interest rate differential |
| `pair_id` | `reporter + "_" + partner` | Panel entity for PanelGLS |
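
The formulas in the table can be sketched in pandas as follows. This is a minimal illustration, not the project's actual code: it assumes the merged panel already carries origin columns suffixed `_i` and destination columns suffixed `_j`, and the function name `add_constructed_variables` and the toy values are hypothetical.

```python
import numpy as np
import pandas as pd


def add_constructed_variables(panel: pd.DataFrame) -> pd.DataFrame:
    """Add the constructed variables to a (reporter, partner, year) panel.

    Assumed layout: origin columns end in _i, destination columns in _j.
    """
    out = panel.copy()
    for k in (1, 2, 3):
        # Signed demographic difference and its interaction with destination openness
        out[f"dZ_{k}"] = out[f"Z_{k}_i"] - out[f"Z_{k}_j"]
        out[f"dZ_{k}_x_kaopen_j"] = out[f"dZ_{k}"] * out["kaopen_j"]
    out["log_dist"] = np.log(out["dist_weighted"])
    out["log_gdp_product"] = np.log(out["ngdp_usd_i"] * out["ngdp_usd_j"])
    # Extensive margin indicator; the log is left missing for non-positive positions
    positive = out["portfolio_total"] > 0
    out["has_portfolio_total"] = positive.astype(int)
    out["log_portfolio_total"] = np.log(out["portfolio_total"].where(positive))
    out["rate_diff_ij"] = out["real_bond_10y_diff_i"] - out["real_bond_10y_diff_j"]
    out["pair_id"] = out["reporter"] + "_" + out["partner"]
    return out


# Toy rows with made-up values, for illustration only
toy = pd.DataFrame({
    "reporter": ["USA", "DEU"],
    "partner": ["JPN", "CHN"],
    "year": [2020, 2020],
    "Z_1_i": [0.50, 0.40], "Z_1_j": [0.20, 0.60],
    "Z_2_i": [1.0, 2.0], "Z_2_j": [0.5, 2.5],
    "Z_3_i": [10.0, 12.0], "Z_3_j": [9.0, 15.0],
    "kaopen_j": [2.0, -1.0],
    "dist_weighted": [10000.0, 7500.0],
    "ngdp_usd_i": [2.0e13, 4.0e12], "ngdp_usd_j": [5.0e12, 1.5e13],
    "portfolio_total": [1.0e6, 0.0],
    "real_bond_10y_diff_i": [1.0, 0.5], "real_bond_10y_diff_j": [0.2, 1.5],
})
panel = add_constructed_variables(toy)
print(panel[["pair_id", "dZ_1", "dZ_1_x_kaopen_j", "has_portfolio_total"]])
```

Masking non-positive positions before taking the log keeps zero holdings in the panel for the extensive-margin indicator while excluding them from the log-level regressions.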

## Estimation Results Summary

### Phase 2: Main gravity models

| Model | R² | N obs | N pairs | Key finding |
|:------|:---|:------|:--------|:------------|
| 2a: Baseline gravity | 0.232 | 104,965 | 7,885 | Standard gravity signs confirmed |
| 2b: + Demographics | 0.240 | 104,965 | 7,885 | ΔZ₁=3.68***, ΔZ₂=-0.49***, ΔZ₃=0.019*** |
| 2c: + KAOPEN interactions | 0.288 | 95,653 | 7,415 | All ΔZ×KAOPEN_j significant (p<0.023) |
| 2d: Portfolio equity | 0.205 | 76,247 | 5,890 | Only ΔZ₁=1.73** significant; ΔZ₂, ΔZ₃ insignificant |
| 2d: Portfolio debt | 0.199 | 85,859 | 7,027 | ΔZ₁=3.88***, all significant |
| 2d: FDI outward | 0.299 | 109,647 | 11,706 | All demographic terms insignificant (p>0.37) |
| 2e: + Price controls | 0.579 | 11,473 | 506 | rate_diff_ij not significant; small OECD sample |
| 2f: Fitted rate diff (Carvalho) | 0.237 | 104,965 | 7,885 | fitted_rate_diff_ij=-0.161*** (p<0.001) |

### Two-Stage Rate Mediation Decomposition

| Metric | Value |
|:-------|:------|
| Baseline R² (gravity only) | 0.2315 |
| R² with full ΔZ | 0.2401 (+0.0086) |
| R² with fitted Δr̂ only | 0.2365 (+0.0050) |
| Rate-mediated share | 58.3% |
| Direct/other channel share | 41.7% |
| cf. multilateral (5.1) | rates=9%, direct=89% |

S1 coefficients used (Z-only → real_bond_10y_diff, 23 OECD countries, N=689):
- Z₁: 16.320 (p=0.12), Z₂: -2.075 (p=0.14), Z₃: 0.072 (p=0.18)
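
The shares in the decomposition table can be approximately reproduced from the R² values. The sketch below assumes the rate-mediated share is defined as the R² gain from the fitted rate differential alone divided by the R² gain from the full ΔZ block; that definition is inferred from the table rather than stated in it.

```python
# R² values copied from the decomposition table (rounded to four decimals)
r2_baseline = 0.2315     # gravity only
r2_full_dz = 0.2401      # gravity + full ΔZ
r2_fitted_rate = 0.2365  # gravity + fitted rate differential only

gain_full = r2_full_dz - r2_baseline
gain_rate = r2_fitted_rate - r2_baseline

# Assumed definition: fraction of the ΔZ gain recovered by the fitted rate alone
rate_share = gain_rate / gain_full
direct_share = 1.0 - rate_share

print(f"rate-mediated: {rate_share:.1%}, direct/other: {direct_share:.1%}")
```

With the rounded inputs this gives roughly 58% and 42%. The small gap to the reported 58.3% and 41.7% is presumably due to the table's shares being computed from unrounded R² values.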

### Phase 4: Expanded Bond Yield Coverage

| Metric | Original (23 ctry) | Expanded (35 ctry) |
|:-------|:-------------------|:-------------------|
| Portfolio obs with rate diff | 11,473 | 24,431 |
| Pairs with rate diff | 506 | 1,180 |
| Model 2e rate_diff p-value | 0.249 | 0.102 |
| S1 Z₁ p-value | 0.385 | 0.198 |
| S1 R² | 0.019 | 0.006 |

- Added countries: POL, CZE, HUN, ISR, CHL, ZAF, ISL, LUX, SVK, SVN, RUS, CHN
- Source: FRED IRLTLT01 series (OECD MEI long-term interest rates)
- Key finding: the rate differential is still not significant (p=0.102) even with 2.1× more observations and 2.3× more pairs

### Phase 3: Robustness

| Test | Result |
|:-----|:-------|
| Excl CCA pairs | Coefficients change <7%, all remain p<0.001 |
| Excl CCA non-commodity | Coefficients change <1% |
| Jackknife (11 regions) | 9/11 leave ΔZ significant; East Asia and MENA sensitive |
| Extensive margin (logit) | All ΔZ significant (p<0.001), pseudo-R²=0.262 |
| Intensive margin (GLS) | Same as full sample |
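
The jackknife row can be read as a leave-one-region-out loop. The sketch below is a generic illustration under stated assumptions: each observation carries a single region label (how pairs are assigned to regions is not specified here), `leave_one_region_out` and `ols_slope` are hypothetical names, and a simple OLS slope stands in for the actual gravity estimator.

```python
from typing import Callable, Dict

import numpy as np
import pandas as pd


def leave_one_region_out(
    df: pd.DataFrame,
    region_col: str,
    estimate: Callable[[pd.DataFrame], float],
) -> Dict[str, float]:
    """Re-run `estimate` once per region, each time dropping that region."""
    results = {}
    for region in sorted(df[region_col].dropna().unique()):
        results[region] = estimate(df[df[region_col] != region])
    return results


def ols_slope(sample: pd.DataFrame) -> float:
    """Stand-in estimator: slope of y on dZ_1 (not the project's PanelGLS)."""
    slope, _intercept = np.polyfit(sample["dZ_1"], sample["y"], deg=1)
    return float(slope)


# Toy data with made-up values; y = 2 * dZ_1 exactly
toy = pd.DataFrame({
    "region": ["East Asia", "East Asia", "MENA", "MENA", "Europe", "Europe"],
    "dZ_1": [0.0, 1.0, 2.0, 3.0, 4.0, 5.0],
    "y": [0.0, 2.0, 4.0, 6.0, 8.0, 10.0],
})
slopes = leave_one_region_out(toy, "region", ols_slope)
for dropped, slope in slopes.items():
    print(f"dropping {dropped}: slope = {slope:.3f}")
```

Comparing each leave-one-out estimate with the full-sample estimate shows which regions drive the result, which is how a sensitivity such as the one reported for East Asia and MENA would surface.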

### Clearing channels (from 5.1)

| Channel | Full sample | AE | EM |
|:--------|:-----------|:---|:---|
| REER | 3.5% | 23.4% | 0.2% |
| Fiscal | -1.8% | -0.1% | -1.0% |
| Interest rates | 9.0% | 9.0% | 9.0% |
| Direct/residual | 89.3% | 67.6% | 91.8% |

## Relationship to Other Projects

- **Followup paper** (`followup/`): Multilateral current account (CA) model covering 140 countries. This gravity paper provides bilateral confirmation.
- **Clearing channels** (`clearing_channels/`): Channel decomposition integrated as Section 6 of this paper.
- **Original paper** (`paper/`): 69-country baseline. This gravity paper extends the identification strategy.

## Technical Notes

### IMF API (imfp package)
- PIP database uses ISO3 country codes (same as our panel)
- DIP database also uses ISO3
- Bilateral data is large: use chunk_size=15, sleep 1.5s between chunks
- Full CPIS download takes ~30 minutes (3 indicators × 18 chunks)
- Country codes starting with TX, GX, W0, W1 are aggregates — filter out
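
The chunking and aggregate filtering described above can be sketched without committing to a specific client call. In this illustration `fetch` is a hypothetical callable that wraps whatever request the download script actually makes; only the chunk size, the pause, and the aggregate prefixes come from the notes above.

```python
import time
from typing import Callable, Iterator, List, Sequence

AGGREGATE_PREFIXES = ("TX", "GX", "W0", "W1")


def is_aggregate(code: str) -> bool:
    """True for codes that denote aggregates rather than countries."""
    return code.startswith(AGGREGATE_PREFIXES)


def chunked(codes: Sequence[str], chunk_size: int = 15) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `chunk_size` codes."""
    for start in range(0, len(codes), chunk_size):
        yield list(codes[start:start + chunk_size])


def download_in_chunks(
    codes: Sequence[str],
    fetch: Callable[[List[str]], object],
    chunk_size: int = 15,
    pause_seconds: float = 1.5,
) -> list:
    """Call `fetch` once per chunk, pausing between (not after) requests."""
    chunks = list(chunked(codes, chunk_size))
    results = []
    for n, chunk in enumerate(chunks):
        results.append(fetch(chunk))
        if n < len(chunks) - 1:
            time.sleep(pause_seconds)
    return results


# Demo with a fake fetcher and no pause; the codes are placeholders
codes = ["USA", "JPN", "DEU", "W00", "TX1", "GX2", "CHN"]
countries = [c for c in codes if not is_aggregate(c)]
batches = download_in_chunks(countries, fetch=lambda chunk: list(chunk),
                             chunk_size=3, pause_seconds=0.0)
print(countries)
print(batches)
```

Filtering aggregates before chunking avoids spending requests on codes that would be dropped afterwards.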

### PanelGLS with pair entities
- Entity IDs are pair strings ("USA_JPN", "DEU_CHN", etc.)
- AR(1) correction operates within pairs across years
- High ρ (~0.94-0.95) reflects persistent bilateral positions
- Time IDs are integer years
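
To illustrate what "within pairs across years" means, the sketch below computes a pooled AR(1) coefficient from residuals, lagging only inside each pair and only across consecutive years. This is a common textbook estimator and is not necessarily what `PanelGLS` in `src/model.py` does; the function name `pooled_ar1_rho` and the `resid` column are hypothetical.

```python
import pandas as pd


def pooled_ar1_rho(
    df: pd.DataFrame,
    resid_col: str = "resid",
    entity_col: str = "pair_id",
    time_col: str = "year",
) -> float:
    """Pooled AR(1) coefficient: sum(e_t * e_{t-1}) / sum(e_{t-1}^2)."""
    ordered = df.sort_values([entity_col, time_col])
    grouped = ordered.groupby(entity_col)
    # Lags are taken within each pair, so they never cross pair boundaries
    lag_resid = grouped[resid_col].shift(1)
    lag_year = grouped[time_col].shift(1)
    # Keep only observations whose previous row is exactly one year earlier
    consecutive = (ordered[time_col] - lag_year) == 1
    e_t = ordered.loc[consecutive, resid_col]
    e_lag = lag_resid[consecutive]
    return float((e_t * e_lag).sum() / (e_lag ** 2).sum())


# Toy residuals (made up); DEU_CHN has a gap between 2002 and 2004
toy = pd.DataFrame({
    "pair_id": ["USA_JPN"] * 3 + ["DEU_CHN"] * 3,
    "year": [2001, 2002, 2003, 2001, 2002, 2004],
    "resid": [1.0, 0.5, 0.25, 2.0, 1.0, 9.0],
})
rho = pooled_ar1_rho(toy)
print(rho)
```

The consecutive-year check matters for unbalanced bilateral panels: without it, a gap in reporting would be treated as a one-year lag.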

### CEPII data quality
- dist_weighted column in XLS has "." for missing values
- Convert the column with `pd.to_numeric(..., errors='coerce')` before use
- Use dist_simple as fallback where dist_weighted is NaN
- Self-pairs (iso_o == iso_d) must be dropped
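
The cleaning steps above can be sketched as follows. It assumes the raw sheet has already been loaded into a DataFrame with the CEPII column names `iso_o`, `iso_d`, `dist`, and `distw`; the function name `clean_cepii` and the toy distances are hypothetical.

```python
import pandas as pd


def clean_cepii(raw: pd.DataFrame) -> pd.DataFrame:
    """Coerce distances to numeric, fill distw from dist, drop self-pairs."""
    out = raw.copy()
    for col in ("dist", "distw"):
        # "." and any other non-numeric marker become NaN
        out[col] = pd.to_numeric(out[col], errors="coerce")
    out = out.rename(columns={"dist": "dist_simple", "distw": "dist_weighted"})
    # Fall back to the simple distance where the weighted one is missing
    out["dist_weighted"] = out["dist_weighted"].fillna(out["dist_simple"])
    # Self-pairs carry no bilateral information
    out = out[out["iso_o"] != out["iso_d"]]
    return out.reset_index(drop=True)


# Toy rows with made-up distances; "." marks a missing weighted distance
raw = pd.DataFrame({
    "iso_o": ["USA", "USA", "FRA"],
    "iso_d": ["USA", "JPN", "DEU"],
    "dist": [100.0, 10000.0, 800.0],
    "distw": ["150.5", ".", "800.5"],
})
geo = clean_cepii(raw)
print(geo)
```

Coercing before filling is the important ordering: while the column still holds the string ".", `fillna` has nothing to fill and `log(dist_weighted)` would fail.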
